Chinese Sentiment Analysis Practice

2019 Ironman Contest (鐵人賽)

catxxx519

2018-11-13 23:38:34



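Over the past few days I found a Chinese sentiment dictionary. Unlike what I used before, this is a dictionary (a list of words labeled positive or negative) rather than a corpus, so I wanted to try practicing with it.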

First, load each dictionary file into its own list.

import re

# Read the NTUSD negative and positive word files, one entry per line
with open('../dict/台湾大学简体中文情感极性词典ntusd/ntusd-negative.txt', mode='r', encoding='utf-8') as f:
    negs = f.readlines()
with open('../dict/台湾大学简体中文情感极性词典ntusd/ntusd-positive.txt', mode='r', encoding='utf-8') as f:
    poss = f.readlines()

# Pull the word tokens out of each line; this drops newlines and stray whitespace
pos = []
for line in poss:
    pos.extend(re.findall(r'\w+', line))
neg = []
for line in negs:
    neg.extend(re.findall(r'\w+', line))

print(neg[:5])
['一下子爆发', '一下子爆发的一连串', '一巴掌', '一再', '一再叮嘱']
print(pos[:5])
['一帆风顺', '一帆风顺的', '一流', '一致', '一致的']
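
To make the parsing step easier to see in isolation, here is a minimal sketch that runs the same `re.findall(r'\w+', ...)` extraction on a few in-memory lines. The sample lines and the helper name `parse_lexicon` are hypothetical stand-ins, not part of the NTUSD files or the original code. It also converts the result to a set, which is one possible way to make later membership checks fast.

```python
import re

# Hypothetical sample lines standing in for what f.readlines() returns
sample_lines = ['一帆风顺\n', '一流 \n', '\n']

def parse_lexicon(lines):
    """Collect the word tokens from each line, skipping blank lines."""
    words = []
    for line in lines:
        # In Python 3, \w matches Chinese characters as well as letters and digits
        words.extend(re.findall(r'\w+', line))
    return words

words = parse_lexicon(sample_lines)
word_set = set(words)  # assumption: a set is used for fast "word in lexicon" checks

print(words)
print('一流' in word_set)
```

Note that an entry containing a space or punctuation would be split into several tokens by this pattern, so it is worth checking the raw files if a multi-part entry matters.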

Next, segment each sentence into words and remove the stop words.

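import jieba

# Load the stop-word list, one word per line (assuming the file is UTF-8, like the dictionaries)
with open('stop_words.txt', encoding='utf-8') as f:
    stop_words = [w.strip() for w in f]

def sent2word(sentence, stop_words=stop_words):
    # Segment the sentence with jieba, then drop any token that is a stop word
    words = jieba.cut(sentence)
    words = [w for w in words if w not in stop_words]
    return words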
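
The filtering half of this step can be sketched without jieba, assuming the sentence has already been segmented. The token list, the stop words, and the helper name `remove_stop_words` below are all hypothetical and chosen only for illustration; a real stop-word file would be much longer.

```python
def remove_stop_words(tokens, stop_words):
    """Keep only the tokens that are not in the stop-word collection."""
    stop_set = set(stop_words)  # a set keeps each membership check cheap
    return [t for t in tokens if t not in stop_set]

# Hypothetical tokens, standing in for the output of jieba.cut on a sentence
tokens = ['这', '部', '电影', '的', '剧情', '很', '精彩']
# Hypothetical stop words for this example only
stop_words = ['的', '这', '部']

filtered = remove_stop_words(tokens, stop_words)
print(filtered)
```

Whether a word such as 很 ("very") belongs in the stop-word list is a design choice: removing it would discard information about how strong the sentiment is.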

From reading other posts, I learned that two more kinds of dictionaries are needed at this point.